    Getting More out of Large Language Models for Proofs

    Large language models have the potential to simplify formal theorem proving and make it more accessible, but how to get the most out of these models is still an open question. To answer it, we take a step back and explore the failure cases of these models under common prompting-based techniques. Our talk discusses these failure cases and what they can teach us about getting more out of these models.

    Ornaments for Proof Reuse in Coq

    Ornaments express relations between inductive types that share the same inductive structure. We implement fully automatic proof reuse for a particular class of ornaments in a Coq plugin, and show how such a tool can give programmers the rewards of using indexed inductive types while automating away many of the costs. The plugin works directly on Coq code; it is the first ornamentation tool for a non-embedded dependently typed language. It is also the first tool to automatically identify ornaments: to lift a function or proof, the user must provide only the source type, the destination type, and the source function or proof. By taking advantage of the mathematical properties of ornaments, our approach produces faster functions and smaller terms than a more general approach to proof reuse in Coq.
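
    The classic instance of an ornament, lists related to length-indexed vectors, can be sketched in a few lines. The following is a minimal Lean 4 analogue of the idea (the plugin itself targets Coq; the names `Vect` and `Vect.append` are illustrative, not the tool's output):

```lean
-- Vectors are the ornament of lists indexed by length.
inductive Vect (α : Type) : Nat → Type where
  | nil  : Vect α 0
  | cons : α → Vect α n → Vect α (n + 1)

-- Lifting list append along the ornament: the body is structurally
-- identical to List.append, but the index now tracks the sum of the
-- lengths. (The index is written `n + m` so that each recursive case
-- type-checks definitionally, since Nat.add recurses on its second
-- argument.)
def Vect.append : Vect α m → Vect α n → Vect α (n + m)
  | .nil,       ys => ys
  | .cons x xs, ys => .cons x (Vect.append xs ys)
```

    The point of the plugin is that the second definition, and proofs about it, need not be written by hand: they are derived from the list versions.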

    Proof Repair Infrastructure for Supervised Models: Building a Large Proof Repair Dataset

    We report on our efforts building a new, large proof-repair dataset and benchmark suite for the Coq proof assistant. The dataset is made up of Git commits from open-source projects with old and new versions of definitions and proofs aligned across commits. Building this dataset has been a significant undertaking, highlighting a number of challenges and gaps in existing infrastructure. We discuss these challenges and gaps, and we provide recommendations for how the proof assistant community can address them. Our hope is to make it easier to build datasets and benchmark suites so that machine-learning tools for proofs will move to target the tasks that matter most and do so equitably across proof assistants
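
    The core alignment step can be pictured as matching definitions by name across two snapshots of a project. A minimal Python sketch, assuming a toy data model (definition name mapped to body text; this is illustrative, not the dataset's actual schema):

```python
def align(old: dict[str, str], new: dict[str, str]):
    """Align definitions by name across two commit snapshots.

    Returns the definitions whose bodies changed (candidate proof
    breakages), plus those removed and added between the commits.
    """
    changed = {n: (old[n], new[n])
               for n in old.keys() & new.keys() if old[n] != new[n]}
    removed = sorted(old.keys() - new.keys())
    added = sorted(new.keys() - old.keys())
    return changed, removed, added

# Toy "commits": definition name -> body.
old = {"size": "length l", "rev_app": "proof1"}
new = {"size": "List.length l", "rev_app": "proof1", "map_len": "proof2"}
changed, removed, added = align(old, new)
print(changed)  # {'size': ('length l', 'List.length l')}
```

    Real alignment is harder than this sketch suggests (definitions get renamed, moved across files, or split), which is part of why building the dataset was a significant undertaking.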

    Long-Term Mentoring for Computer Science Researchers

    Early in the pandemic, we -- leaders in the research areas of programming languages (PL) and computer architecture (CA) -- realized that we had a problem: the only way to form new lasting connections in the community was to already have lasting connections in the community. Both of our academic communities had wonderful short-term mentoring programs to address this problem, but it was clear that we needed long-term mentoring programs. Those of us in CA approached this scientifically, making an evidence-backed case for community-wide long-term mentoring. In the meantime, one of us in PL had impulsively launched an unofficial long-term mentoring program, founded on chaos and spreadsheets. In January 2021, the latter grew into an official cross-institutional long-term mentoring program called SIGPLAN-M; in January 2022, the former grew into Computer Architecture Long-term Mentoring (CALM). The impacts have been strong: SIGPLAN-M reaches 328 mentees and 234 mentors across 41 countries, and mentees have described it as "life changing" and "a career saver." And while CALM is in its pilot phase -- with 13 mentors and 21 mentees across 7 countries -- it has received very positive feedback. As the leaders of SIGPLAN-M and CALM, we shared our designs, impacts, and challenges with each other along the way. Now, we wish to share them with you. We hope this will kick-start a larger long-term mentoring effort across all of computer science.

    Passport: Improving Automated Formal Verification Using Identifiers

    Formally verifying system properties is one of the most effective ways of improving system quality, but its high manual effort requirements often render it prohibitively expensive. Tools that automate formal verification, by learning from proof corpora to suggest proofs, have just begun to show their promise. These tools are effective because of the richness of the data the proof corpora contain. This richness comes from the stylistic conventions followed by communities of proof developers, together with the logical systems beneath proof assistants. However, this richness remains underexploited, with most work thus far focusing on architecture rather than on making the most of the proof data. In this paper, we develop Passport, a fully-automated proof-synthesis tool that systematically explores how to most effectively exploit one aspect of that proof data: identifiers. Passport enriches a predictive Coq model with three new encoding mechanisms for identifiers: category vocabulary indexing, subword sequence modeling, and path elaboration. We compare Passport to three existing base tools that Passport can enhance: ASTactic, Tac, and Tok. In head-to-head comparisons, Passport automatically proves 29% more theorems than the best-performing of these base tools. Combining the three Passport-enhanced tools automatically proves 38% more theorems than the three base tools together, without Passport's enhancements. Finally, together, these base tools and Passport-enhanced tools prove 45% more theorems than the combined base tools without Passport's enhancements. Overall, our findings suggest that modeling identifiers can play a significant role in improving proof synthesis, leading to higher-quality software.
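
    Of the three mechanisms, subword sequence modeling is the easiest to picture: an identifier like `rev_app_distr` carries meaning in its pieces. A rough Python sketch of splitting identifiers at underscores and camelCase boundaries (the splitting rule and function name are illustrative, not Passport's actual tokenizer):

```python
import re

def subwords(ident: str) -> list[str]:
    """Split an identifier into subword tokens at underscores and
    camelCase boundaries, a simplified analogue of subword sequence
    modeling for proof-corpus identifiers."""
    tokens = []
    for part in ident.split("_"):
        # Runs of caps before a capitalized word, capitalized or
        # lowercase words, remaining runs of caps, and digit runs.
        tokens.extend(re.findall(
            r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|\d+", part))
    return tokens

print(subwords("rev_app_distr"))  # ['rev', 'app', 'distr']
print(subwords("NoDup_cons"))     # ['No', 'Dup', 'cons']
```

    A model that sees `rev`, `app`, and `distr` as related pieces can generalize across identifiers that share subwords, rather than treating each full identifier as an opaque token.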

    Proof Repair

    Thesis (Ph.D.)--University of Washington, 2021. The days of verifying only toy programs are long gone. The last two decades have marked a new era of verification at scale, bringing strong guarantees to large and critical systems: an era of proof engineering. Proof engineering is for verified systems what software engineering is for unverified systems. Still, while proof engineering, like software engineering, is about both development and maintenance, most proof engineering technologies so far have focused on development. When it comes to maintaining these systems, proof engineering is decades behind software engineering. This thesis introduces proof repair: a new approach to maintaining verified systems. Proof repair reimagines the automation proof engineers typically use to interactively guide tools to search for a machine-checked proof. When a system changes and this breaks a proof about the system, traditional automation searches for the fixed proof from scratch. Proof repair, in contrast, is change-aware automation: it determines how the system has changed, and uses that information to help fix the broken proof. Proof repair in this thesis works by combining semantic differencing algorithms with program transformations. Importantly, both differencing and the transformations operate over low-level representations of proofs called proof terms. Thanks to the richness of these proof terms, differencing and the transformations can leverage new and existing results in dependent type theory. For example, one transformation externalizes univalent transport from homotopy type theory, leveraging novel transformations over equalities to make this possible. This approach is realized in a proof repair tool suite for the Coq proof assistant. Case studies show, both retroactively and through live use, that this tool suite can save work for proof engineers on real proof developments.
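
    The contrast between search-from-scratch and change-aware automation can be reduced to a toy: diff the old and new definitions, then replay the diff in the broken proof. A Python sketch under that deliberately simplified model (the real tool suite diffs low-level proof terms and applies semantic transformations, not token-level string substitution):

```python
def repair_hint(old_def: str, new_def: str, broken_proof: str) -> str:
    """Toy change-aware repair: diff two definitions token-by-token,
    then replay the resulting renamings in the broken proof script."""
    # Collect positional token renamings between the two definitions.
    subst = {o: n
             for o, n in zip(old_def.split(), new_def.split()) if o != n}
    # Apply the same renamings to the broken proof.
    return " ".join(subst.get(tok, tok) for tok in broken_proof.split())

print(repair_hint("Definition id := fun x => x",
                  "Definition ident := fun x => x",
                  "apply id"))  # apply ident
```

    Even this toy shows the key asymmetry: the repair uses the change itself as input, where traditional automation would discard it and search from scratch.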

    QED at large: a survey of engineering of formally verified software

    This monograph provides the reader with an insightful overview of the work that has led to modern-day techniques for formally verifying software. In times of increasing automation, formal verification underpins a growing number of software systems, so the survey also highlights future trends.

    White matter hyperintensities correlate to cognition and fiber tract integrity in older adults with HIV

    Our aim was to examine the clinical relevance of white matter hyperintensities (WMH) in HIV. We used an automated approach to quantify WMH volume in HIV seropositive (HIV+; n = 65) and HIV seronegative (HIV-; n = 29) adults over age 60. We compared WMH volumes between HIV+ and HIV- groups in cross-sectional and multiple time-point analyses. We also assessed correlations between WMH volumes and cardiovascular measures, HIV severity, cognitive scores, and diffusion tensor imaging variables. Serostatus groups did not differ in WMH volume, but HIV+ participants had less cerebral white matter (mean: 470.95 [43.24] vs. 497.63 [49.42] mL, p = 0.010). The distribution of WMH volume was skewed in HIV+, with a high proportion (23%) falling above the 95th percentile of WMH volume defined by the HIV- group. Serostatus groups had similar amounts of WMH volume growth over time. Total WMH volume directly correlated with measures of hypertension and inversely correlated with measures of global cognition, particularly executive functioning and psychomotor speed. Greater WMH volume was associated with poorer brain integrity measured from diffusion tensor imaging (DTI) in the corpus callosum and sagittal stratum. In this group of HIV+ individuals over 60, WMH burden was associated with cardiovascular risk and with both worse diffusion MRI and worse cognition. The median total burden did not differ by serostatus; however, a subset of HIV+ individuals had high WMH burden.
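
    The skewness analysis above amounts to defining a cutoff at the 95th percentile of the seronegative group and asking what fraction of the seropositive group exceeds it. A Python sketch on synthetic, illustrative volumes (not the study's data; the lognormal parameters are arbitrary):

```python
import random
import statistics

random.seed(0)
# Synthetic WMH volumes in mL, matching only the group sizes (29 / 65).
hiv_neg = [random.lognormvariate(0.5, 0.8) for _ in range(29)]
hiv_pos = [random.lognormvariate(0.7, 1.0) for _ in range(65)]

# Cutoff: 95th percentile of the HIV- group (index 94 of 99 cut points).
threshold = statistics.quantiles(hiv_neg, n=100)[94]
# Fraction of HIV+ participants above the HIV- cutoff.
high_burden = sum(v > threshold for v in hiv_pos) / len(hiv_pos)
print(f"cutoff = {threshold:.2f} mL; HIV+ above cutoff: {high_burden:.0%}")
```

    If the two groups had identical distributions, roughly 5% of the HIV+ group would land above the cutoff; the study's observed 23% is what indicates a high-burden subset.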